Abstract: In this paper, we propose a new method based on Hidden Markov Models to interpret temporal sequences of sensor data from mobile robots in order to detect features automatically. Hidden Markov Models have long been used in pattern recognition, especially in speech recognition. Their main advantage over other methods (such as neural networks) is their ability to model noisy temporal signals of variable length. We show in this paper that this approach is well suited to the interpretation of temporal sequences of mobile-robot sensor data. We present two distinct experiments and their results: the first in an indoor environment, where a mobile robot learns to detect features like open doors or T-intersections; the second in an outdoor environment, where a different mobile robot has to identify situations like climbing a hill or crossing a rock.

Keywords: sensor data interpretation, Hidden 
Markov Models, mobile robots 
1 Introduction

A mobile robot operating in a dynamic environment is provided with sensors (infrared sensors, ultrasonic sensors, tactile sensors, cameras, etc.) in order to perceive its environment. Unfortunately, the numeric, noisy data furnished by these sensors are not directly useful; they must first be interpreted to provide accurate and usable information about the environment. This interpretation plays a crucial role, since it makes it possible for the robot to detect pertinent features in its environment and to use them for various tasks.

For instance, for a mobile robot, the automatic 
recognition of features is an important issue for 
the following reasons: 

1. For successful navigation in large-scale environments, mobile robots must be able to localize themselves in their environment. Almost all existing localization approaches jS] extract a small set of features. During navigation, mobile robots detect features and match them with known features of the environment in order to compute their position;
2. Feature recognition is the first step in the automatic construction of maps. For instance, at the topological level of his "spatial semantic hierarchy" system, Kuipers [15] incrementally builds a topological map by first detecting pertinent features while the robot moves in the environment and then determining the link between a newly detected feature and the features contained in the current map;

3. Features can be used by a mobile robot as subgoals for a navigation plan [16].

In semi-autonomous or remote, teleoperated robotics, automatic detection of features is a necessary ability. In the case of limited and delayed communication, such as for planetary rovers, human interaction is restricted, so feature detection can only be practically performed through on-board interpretation of the sensor information. Moreover, feature detection from raw sensor data, especially when based on a combination of sensors, is a complex task that humans generally cannot perform in real time, as would be required even if the communication constraints allowed teleoperation. For all these reasons, feature detection has received considerable attention over the past few years. This problem can be classified according to the following criteria:

Natural/artificial. The first criterion is the nature of the feature. The features can be artificial, that is, added to the existing environment. Becker et al. [1] define a set of artificial features² located on the ceiling and use a camera to detect them. Other techniques use natural features, that is, features already existing in the environment. For instance, Kortenkamp, Baker, and Weymouth use ultrasonic sensors to detect natural features like open doors and T-intersections.

² The features are patterns composed of 3x3 squares, and each square is colored black or white.

Using artificial features makes detecting and distinguishing features easier, because the features are designed to be simple to detect. But this approach can be time-consuming, because the features have to be designed and positioned in the environment. Moreover, using artificial features is impossible in unknown or remote environments.

Analytical/statistical methods. Feature detection has been addressed by different approaches, such as analytical methods or pattern-classification methods. In the analytical approach, the problem is studied as a reasoning process: a knowledge-based system uses rules to build a representation of features. For instance, Kortenkamp, Baker, and Weymouth ^3] use rules about the variation of the sonar sensors to learn different types of features and add visual information to distinguish two features of the same type. In contrast, a statistical pattern-classification system attempts to describe the observations coming from the sensors as a random process. The recognition process consists of associating the signal acquired from the sensors with a model of the feature to identify. For instance, Yamauchi j^S] uses ultrasonic sensors to build evidence grids [8]. An evidence grid is a grid corresponding to a discretization of the local environment of the mobile robot. In this grid, Yamauchi's method updates the probability of occupancy of each grid tile using several sensor readings. To perform the detection, he defines an algorithm to match two evidence grids.
These two approaches are complementary. In the analytical approach, we aim to understand the sensor data and build a representation of these data. But since the sensor data may be noisy, their interpretation may not be straightforward; moreover, overly simple descriptions of the sensor data (e.g., "current rising, steady, then falling") may not directly correspond to the actual data.

In the second approach, we build models that 
represent the statistical properties of the data. 
This approach naturally takes into account the 
noisy data, but it is generally difficult to under- 
stand the correspondence between detected fea- 
tures and the sensor data. 

A solution that combines the two approaches could build models corresponding to a human's understanding of the sensor data, and adjust the model parameters according to the statistical properties of the data.

Automatic/manual feature definition. 

The set of features to detect could be given manually or discovered automatically [22]. In the manual approach, the set is defined by humans using their perception of the environment. Since high-level robotic systems are generally loosely based on human perception, integrating feature detection into such a system is easier than with automatically discovered features. Moreover, in teleoperated robotics, where humans interact with the robot, the features must correspond to the operator's high-level perception to be useful. These are the main reasons the set is almost always defined by humans. However, defining the features so that they can be recognized robustly by a robot remains a difficult problem; this paper proposes a method to address it.



In contrast, when features are discovered auto- 
matically, humans must find the correspondence 
between features perceived by the robot and 
features they perceive. The difficulty now rests 
on the shoulders of the humans. 

Temporally extended/instantaneous fea- 
tures. Some features can only be identified by 
considering a temporal sequence of sensor infor- 
mation, not simply a snapshot, especially with 
telemetric sensors. Consider for example the de- 
tection of a feature in or the construction of 
an evidence grid in [22]: these two operations use 
a temporal sequence of sensor information. In general, instantaneous detection (i.e., based on a single snapshot) is less robust than temporal detection.

This paper describes an approach that com- 
bines an analytical approach for the high-level 
topology of the environment with a statistical 
approach to feature detection. The approach is 
designed to detect natural, temporally extended 
features that have been manually defined. The 
feature detection uses Hidden Markov Models 
(HMMs). HMMs are a particular type of prob- 
abilistic automata. The topology of these au- 
tomata corresponds to a human's understanding 
of sequences of sensor data characterizing a par- 
ticular feature in the robot's environment. We 
use HMMs for pattern recognition. From a set of training data produced by its sensors and collected at a feature that it has to identify (a door, a rock, ...), the robot adjusts the parameters of the corresponding model to take into account the statistical properties of the sequences of sensor data. At recognition time, the robot chooses the model whose probability given the sensor data (the a posteriori probability) is maximized. We thus combine analytical methods, to define the topology of the automata, with statistical pattern-classification methods, to adjust the parameters of the model.

The HMM approach is a flexible method for handling the large variability of complex temporal signals; for example, it is a standard method for speech recognition ^Jj. In contrast to dynamic time warping, where heuristic training methods are used to estimate templates, stochastic modeling allows probabilistic and automatic training for estimating models. The particular approach we use is the second-order HMM (HMM2), which has been used in speech recognition [17], often outperforming first-order HMMs.

This paper is organized as follows. We first define the HMM2 and describe the algorithms used for training and recognition. Section 3 describes our method for feature detection, which combines HMM2s with a grammar-based analytical method describing the environment. In section 4 we present an experiment applying our method to detect natural features like open doors or T-intersections in an indoor structured environment for an autonomous mobile robot. A second experiment, on a semi-autonomous mobile robot in an outdoor environment, is described in section 5. We then report related work in section 6 and give some conclusions and perspectives in section 7.

2 Second-order Hidden 
Markov Models 

In this section, we only present second-order Hidden Markov Models in the special case of multidimensional continuous observations (representing the data of several sensors). We also detail the second-order extension of the recognition algorithm (Viterbi algorithm) and of the learning algorithm (Baum-Welch algorithm). A very complete tutorial on first-order Hidden Markov Models can be found in Rabiner [19].

2.1 Definition 

In an HMM2, the underlying state sequence is a second-order Markov chain. Therefore, the probability of a transition between two states at time t depends on the states in which the process was at times t-1 and t-2.

A second-order Hidden Markov Model $\lambda$ is specified by:

• a set of N states called S containing at least one final state;

• a 3-dimensional matrix $a_{ijk}$ over $S \times S \times S$:

$$a_{ijk} = P(q_t = s_k \mid q_{t-1} = s_j, q_{t-2} = s_i) = P(q_t = s_k \mid q_{t-1} = s_j, q_{t-2} = s_i, q_{t-3}, \ldots) \qquad (1)$$

with the constraints

$$\sum_{k=1}^{N} a_{ijk} = 1, \qquad 1 \le i \le N,\ 1 \le j \le N$$

where $q_t$ is the actual state at time t;

• each state is associated with a mixture of Gaussian distributions:

$$b_i(O_t) = \sum_{m=1}^{M} c_{im}\, \mathcal{N}(O_t; \mu_{im}, \Sigma_{im}), \qquad \text{with} \quad \sum_{m=1}^{M} c_{im} = 1 \qquad (2)$$

where $O_t$ is the input vector (the frame) at time t. The mixture of Gaussian distributions is one of the most powerful probability distributions for representing complex, multidimensional probability spaces.
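To make equation (2) concrete, here is a minimal Python sketch of the emission density $b_i(O_t)$ of one state, assuming NumPy and full covariance matrices $\Sigma_{im}$. The helper names `gaussian_pdf` and `mixture_emission` are hypothetical, not part of the original system; a practical implementation would usually work with log-densities to avoid underflow on high-dimensional frames.

```python
import numpy as np


def gaussian_pdf(o, mean, cov):
    """Multivariate normal density N(o; mean, cov) with a full covariance matrix."""
    o = np.asarray(o, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    d = o.shape[0]
    diff = o - mean
    # Mahalanobis term (o - mu)^T Sigma^{-1} (o - mu), computed with a linear solve
    maha = float(diff @ np.linalg.solve(cov, diff))
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must have a positive determinant")
    return float(np.exp(-0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)))


def mixture_emission(o, weights, means, covs):
    """b_i(O_t) = sum over m of c_im * N(O_t; mu_im, Sigma_im), as in equation (2)."""
    if not np.isclose(sum(weights), 1.0):
        raise ValueError("mixture weights c_im must sum to 1")
    return sum(c * gaussian_pdf(o, mu, s) for c, mu, s in zip(weights, means, covs))
```

For example, `mixture_emission([0.0, 0.0], [0.3, 0.7], [[0.0, 0.0], [1.0, 1.0]], [np.eye(2), np.eye(2)])` mixes a component centered on the frame with one centered at (1, 1).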

The probability of the state sequence $Q = q_1, q_2, \ldots, q_T$ is defined as

$$P(Q) = \pi_{q_1}\, a_{q_1 q_2} \prod_{t=3}^{T} a_{q_{t-2} q_{t-1} q_t} \qquad (3)$$

where $\pi_i$ is the probability of state $s_i$ at time t = 1 and $a_{ij}$ is the probability of the transition $s_i \to s_j$ at time t = 2.

Given a sequence of observed vectors $O = O_1, O_2, \ldots, O_T$, the joint state-output probability $P(Q, O \mid \lambda)$ is defined as:

$$P(Q, O \mid \lambda) = \pi_{q_1} b_{q_1}(O_1)\, a_{q_1 q_2} b_{q_2}(O_2) \prod_{t=3}^{T} a_{q_{t-2} q_{t-1} q_t}\, b_{q_t}(O_t) \qquad (4)$$
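Equation (4) can be evaluated directly for a given state path. The sketch below is a hypothetical pure-Python illustration: it assumes the emission likelihoods have already been computed into a table `B[t][k]` holding $b_k(O_{t+1})$ (0-based t), that `pi[i]` holds $\pi_i$, that `a1[i][j]` holds the first-order transition $a_{ij}$ used at t = 2, and that `a2[i][j][k]` holds $a_{ijk}$.

```python
def joint_probability(path, pi, a1, a2, B):
    """P(Q, O | lambda) for a second-order HMM, following equation (4).

    path: 0-based state indices q_1..q_T, with T >= 2 and T == len(B).
    """
    if len(path) != len(B) or len(path) < 2:
        raise ValueError("path and observations must have the same length >= 2")
    q1, q2 = path[0], path[1]
    # First two frames: pi_{q1} b_{q1}(O_1) a_{q1 q2} b_{q2}(O_2)
    p = pi[q1] * B[0][q1] * a1[q1][q2] * B[1][q2]
    # Remaining frames: product of a_{q_{t-2} q_{t-1} q_t} b_{q_t}(O_t)
    for t in range(2, len(path)):
        i, j, k = path[t - 2], path[t - 1], path[t]
        p *= a2[i][j][k] * B[t][k]
    return p
```

With a two-state left-right model (`pi = [1.0, 0.0]`, self-loop and forward transitions of 0.5 from state 0, state 1 absorbing) and frames favoring state 0 then state 1 with likelihood 0.9, the path 0, 0, 1, 1 gets probability 0.9⁴ × 0.5 × 0.5 = 0.164025.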

2.2 The Viterbi algorithm 

The recognition is carried out by the Viterbi algorithm [9], which determines the most likely state sequence given a sequence of observations.

In Hidden Markov Models, many state sequences may generate the same observed sequence $O = O_1, \ldots, O_T$. Given one such output sequence, we are interested in determining the most likely state sequence $Q = q_1, \ldots, q_T$ that could have generated it.

The extension of the Viterbi algorithm to HMM2 is straightforward. We simply replace the reference to a state in the state space S by a reference to an element of the 2-fold product space $S \times S$. The most likely state sequence is found by using the probability of the best partial alignment ending with the transition at times (t-1, t):

$$\delta_t(j,k) = \max_{q_1, \ldots, q_{t-2}} P(q_1, \ldots, q_{t-2}, q_{t-1} = s_j, q_t = s_k, O_1, \ldots, O_t \mid \lambda), \qquad 2 \le t \le T,\ 1 \le j,k \le N \qquad (5)$$

The recursive computation is given by the equation

$$\delta_t(j,k) = \max_{1 \le i \le N} \left[\delta_{t-1}(i,j)\, a_{ijk}\right] b_k(O_t), \qquad 3 \le t \le T,\ 1 \le j,k \le N \qquad (6)$$

The Viterbi algorithm is a dynamic-programming search that computes the best partial state sequence up to time t for all states. The most likely state sequence $q_1, \ldots, q_T$ is obtained by keeping track of back pointers, recording for each computation which previous transition leads to the maximal partial path probability. By tracing back from the final state, we get the most likely state sequence.
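The recursion (6) and the back-pointer trace can be sketched as follows. This is an illustrative pure-Python version working in the log domain, under a hypothetical layout: `pi[i]` for $\pi_i$, `a1[i][j]` for the first-order transition $a_{ij}$ used at t = 2, `a2[i][j][k]` for $a_{ijk}$, and `B[t][k]` for the precomputed emission likelihood $b_k(O_{t+1})$ with 0-based t. The initialization $\delta_2(j,k) = \pi_j b_j(O_1) a_{jk} b_k(O_2)$ is an assumption derived from equation (4), since it is not spelled out above, and the optional `final_states` argument stands in for the model's final states.

```python
import math


def _log(x):
    """Natural log that maps zero probabilities to -inf instead of raising."""
    return math.log(x) if x > 0 else float("-inf")


def viterbi_hmm2(pi, a1, a2, B, final_states=None):
    """Most likely state path of a second-order HMM, equations (5)-(6).

    Works on pairs (j, k) = (q_{t-1}, q_t) in the log domain.
    Returns (path, log_probability) with 0-based state indices.
    """
    T, N = len(B), len(pi)
    if T < 2:
        raise ValueError("need at least two observations")
    # Assumed initialization: delta_2(j, k) = pi_j b_j(O_1) a_jk b_k(O_2)
    delta = [[_log(pi[j]) + _log(B[0][j]) + _log(a1[j][k]) + _log(B[1][k])
              for k in range(N)] for j in range(N)]
    back = [None, None]  # back[t][j][k]: best q_{t-2} given (q_{t-1}, q_t) = (j, k)
    for t in range(2, T):
        new = [[float("-inf")] * N for _ in range(N)]
        ptr = [[0] * N for _ in range(N)]
        for j in range(N):
            for k in range(N):
                # Equation (6): max over i of delta_{t-1}(i, j) + log a_ijk
                scores = [delta[i][j] + _log(a2[i][j][k]) for i in range(N)]
                best_i = max(range(N), key=scores.__getitem__)
                new[j][k] = scores[best_i] + _log(B[t][k])
                ptr[j][k] = best_i
        delta = new
        back.append(ptr)
    last = list(range(N)) if final_states is None else list(final_states)
    j, k = max(((j, k) for j in range(N) for k in last),
               key=lambda pair: delta[pair[0]][pair[1]])
    best = delta[j][k]
    path = [j, k]
    # Trace back: back[t][q_{t-1}][q_t] gives q_{t-2}
    for t in range(T - 1, 1, -1):
        path.insert(0, back[t][path[0]][path[1]])
    return path, best
```

On a two-state left-right model whose first two frames favor state 0 and last two favor state 1, this sketch returns the path [0, 0, 1, 1]. Working with logarithms keeps long runs of small probabilities from underflowing, which matters for sensor streams much longer than this toy example.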

2.3 The Baum-Welch algorithm

The learning of the models is performed by the Baum-Welch algorithm using the maximum likelihood estimation criterion, which determines the best model parameters according to the corpus of items. Intuitively, this algorithm counts the number of occurrences of each transition between the states and the number of occurrences of each observation in a given state in the training corpus. Each count is weighted by the probability of the alignment (state, observation). It must be noted that this criterion does not try to separate models as a neural network does, but only tries to increase the probability that a model generates its corpus, independently of what the other models can do.

Since many state sequences may generate a given output sequence, the probability that a model $\lambda$ generates a sequence $O_1, \ldots, O_T$ is given by the sum of the joint probabilities (given in equation (4)) over all state sequences (i.e., the marginal density of output sequences). To avoid combinatorial explosion, a recursive computation similar to the Viterbi algorithm can be used to evaluate this sum. The forward probability $\alpha_t(j,k)$ is defined as

$$\alpha_t(j,k) = P(O_1, \ldots, O_t, q_{t-1} = s_j, q_t = s_k \mid \lambda), \qquad 2 \le t \le T,\ 1 \le j,k \le N \qquad (7)$$

This is the probability of starting from the initial state, ending with the transition $(s_j, s_k)$ at time t, and generating the output $O_1, \ldots, O_t$, using all possible state sequences in between. The Markov assumption allows the recursive computation of the forward probability as:

$$\alpha_{t+1}(j,k) = \sum_{i=1}^{N} \alpha_t(i,j)\, a_{ijk}\, b_k(O_{t+1}), \qquad 2 \le t \le T-1,\ 1 \le j,k \le N \qquad (8)$$

This computation is similar to Viterbi decoding, except that summation is used instead of max. Summing $\alpha_T(j,k)$ over j, with $s_k = s_N$ the final state, gives the probability that the model $\lambda$ generates the sequence $O_1, \ldots, O_T$. Another useful quantity is the backward function $\beta_t(i,j)$, defined as the probability of the partial observation sequence from t+1 to T, given the model $\lambda$ and the transition $(s_i, s_j)$ between times t-1 and t:

$$\beta_t(i,j) = P(O_{t+1}, \ldots, O_T \mid q_{t-1} = s_i, q_t = s_j, \lambda), \qquad 2 \le t \le T-1,\ 1 \le i,j \le N \qquad (9)$$

The Markov assumption also allows the recursive computation of the backward probability as:

1. Initialization: $\beta_T(i,j) = 1$, $1 \le i,j \le N$

2. Recursion for $2 \le t \le T-1$:

$$\beta_t(i,j) = \sum_{k=1}^{N} \beta_{t+1}(j,k)\, a_{ijk}\, b_k(O_{t+1}), \qquad 1 \le i,j \le N \qquad (10)$$

Given a model $\lambda$ and an observation sequence O, we define $\eta_t(i,j,k)$ as the probability of the transition $s_i \to s_j \to s_k$ between t-1 and t+1 during the emission of the observation sequence:

$$\eta_t(i,j,k) = P(q_{t-1} = s_i, q_t = s_j, q_{t+1} = s_k \mid O, \lambda), \qquad 2 \le t \le T-1$$

We deduce:

$$\eta_t(i,j,k) = \frac{\alpha_t(i,j)\, a_{ijk}\, b_k(O_{t+1})\, \beta_{t+1}(j,k)}{P(O \mid \lambda)}, \qquad 2 \le t \le T-1 \qquad (11)$$

As in the first-order case, we define $\xi_t(i,j)$ and $\gamma_t(j)$:

$$\xi_t(i,j) = \sum_{k=1}^{N} \eta_t(i,j,k) \qquad (12)$$

$$\gamma_t(j) = \sum_{i=1}^{N} \xi_t(i,j) \qquad (13)$$

$\xi_t(i,j)$ represents the a posteriori probability that the stochastic process accomplishes the transition $s_i \to s_j$ between t-1 and t, given the whole utterance. $\gamma_t(j)$ represents the a posteriori probability that the process is in state $s_j$ at time t, given the whole utterance.
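The forward recursion (8), the backward recursion (10), and $\eta_t$ from (11) can be sketched in pure Python as below. The layout is hypothetical: `pi[i]` for $\pi_i$, `a1[i][j]` for the first-order transition used at t = 2, `a2[i][j][k]` for $a_{ijk}$, and `B[t][k]` for the precomputed likelihood $b_k(O_{t+1})$ with 0-based t. For readability the sketch stays in the probability domain (a practical implementation would rescale at each step or use logarithms), assumes the initialization $\alpha_2(j,k) = \pi_j b_j(O_1) a_{jk} b_k(O_2)$ from equation (4), and sums the likelihood over all final pairs rather than only those ending in $s_N$.

```python
from itertools import product


def _zeros(n):
    """n x n table of zeros."""
    return [[0.0] * n for _ in range(n)]


def forward_backward_hmm2(pi, a1, a2, B):
    """Forward (8) and backward (10) passes for a second-order HMM.

    Tables use 0-based time: alpha[t][j][k] is alpha_{t+1}(j, k) in the text,
    for t = 1..T-1 (index 0 is unused). Returns (alpha, beta, likelihood),
    where likelihood sums alpha over all final pairs (j, k).
    """
    T, N = len(B), len(pi)
    if T < 2:
        raise ValueError("need at least two observations")
    alpha = [None] + [_zeros(N) for _ in range(T - 1)]
    beta = [None] + [_zeros(N) for _ in range(T - 1)]
    # Initialization assumed from equation (4)
    for j, k in product(range(N), repeat=2):
        alpha[1][j][k] = pi[j] * B[0][j] * a1[j][k] * B[1][k]
    # Forward recursion, equation (8): sum over the oldest state i
    for t in range(1, T - 1):
        for j, k in product(range(N), repeat=2):
            alpha[t + 1][j][k] = sum(alpha[t][i][j] * a2[i][j][k] for i in range(N)) * B[t + 1][k]
    # Backward initialization beta_T(i, j) = 1, then recursion (10)
    for i, j in product(range(N), repeat=2):
        beta[T - 1][i][j] = 1.0
    for t in range(T - 2, 0, -1):
        for i, j in product(range(N), repeat=2):
            beta[t][i][j] = sum(a2[i][j][k] * B[t + 1][k] * beta[t + 1][j][k] for k in range(N))
    likelihood = sum(alpha[T - 1][j][k] for j, k in product(range(N), repeat=2))
    return alpha, beta, likelihood


def eta(alpha, beta, a2, B, likelihood, t, i, j, k):
    """eta for the transition s_i -> s_j -> s_k, equation (11), 0-based t in 1..T-2."""
    return alpha[t][i][j] * a2[i][j][k] * B[t + 1][k] * beta[t + 1][j][k] / likelihood
```

A useful sanity check on such an implementation is that $\sum_{i,j} \alpha_t(i,j)\,\beta_t(i,j)$ equals $P(O \mid \lambda)$ at every t, and that the $\eta_t(i,j,k)$ at a fixed t sum to 1.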
At this point, to get the new maximum likelihood (ML) estimate of the HMM2, we can choose between two ways of normalizing: one gives an HMM1, the other an HMM2.

The transformation into an HMM1 is done by pooling the counts $\eta_t(i,j,k)$ over all the states i that have been visited at time t-1:

$$\sum_{i=1}^{N} \eta_t(i,j,k) \qquad (14)$$

is the classical first-order count of transitions between two HMM1 states $s_j$ and $s_k$ between t and t+1.

Finally, the first-order maximum likelihood (ML) estimate of $a_{ijk}$ is:

$$\bar{a}_{ijk} = \frac{\sum_{i,t} \eta_t(i,j,k)}{\sum_{i,k,t} \eta_t(i,j,k)} \qquad (15)$$

This value is independent of i and can be written as $a_{jk}$.

The second-order ML estimate of $a_{ijk}$ is given by the equation:

$$\bar{a}_{ijk} = \frac{\sum_t \eta_t(i,j,k)}{\sum_t \xi_t(i,j)} \qquad (16)$$
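Once the $\eta_t(i,j,k)$ have been accumulated over the training corpus, equations (15) and (16) reduce to normalizing counts. A minimal sketch, assuming a hypothetical table `counts[i][j][k]` that already holds $\sum_t \eta_t(i,j,k)$ summed over all training sequences (so that its sum over k equals $\sum_t \xi_t(i,j)$):

```python
def reestimate_second_order(counts):
    """Equation (16): a_ijk = sum_t eta_t(i,j,k) / sum_t xi_t(i,j)."""
    N = len(counts)
    a = [[[0.0] * N for _ in range(N)] for _ in range(N)]
    for i in range(N):
        for j in range(N):
            total = sum(counts[i][j])  # sum over k of the eta counts = sum_t xi_t(i, j)
            for k in range(N):
                a[i][j][k] = counts[i][j][k] / total if total > 0 else 0.0
    return a


def reestimate_first_order(counts):
    """Equation (15): a_jk = sum_{i,t} eta / sum_{i,k,t} eta, independent of i."""
    N = len(counts)
    a = [[0.0] * N for _ in range(N)]
    for j in range(N):
        # Pool the counts over the oldest state i, as in equation (14)
        per_k = [sum(counts[i][j][k] for i in range(N)) for k in range(N)]
        total = sum(per_k)
        for k in range(N):
            a[j][k] = per_k[k] / total if total > 0 else 0.0
    return a
```

Contexts that were never visited (a zero denominator) are left with all-zero probabilities in this sketch; how the original system handled them is not stated.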

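The ML estimates of the mean and covariance are given by the formulas:

$$\mu_i = \frac{\sum_t \gamma_t(i)\, O_t}{\sum_t \gamma_t(i)} \qquad (17)$$

$$\Sigma_i = \frac{\sum_t \gamma_t(i)\, (O_t - \mu_i)(O_t - \mu_i)^{\top}}{\sum_t \gamma_t(i)} \qquad (18)$$

3 Application to mobile robotics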

The method presented in this paper performs feature detection by combining HMM2s with a grammar-based description of the environment.
To apply second order Hidden Markov Models to 
automatically detect features, we must accom- 
plish a number of steps. In this section we re- 
view these steps and our approach for treating 
the issues arising in each of them. In the follow- 
ing sections we expand further on the specifics 
for each experiment. 

The steps necessary to apply HMM2s to detect 
features are the following: 

1. Defining the number of distinct features to 
identify and their characterization. 

As Hidden Markov Models have the ability to model signals whose properties change with time, we choose a set of sensors (as the observations) that show noticeable variations when the mobile robot is observing a particular feature. The features are chosen because they are repeatable and human-observable (for the purposes of labeling and validation). We then define coarse rules to identify each feature, based on the variation of the sensors constituting the observation. These rules are for human use, for segmentation and labeling of the data stream of the training corpus. The set of chosen features is a complete description of what the mobile robot can see during its run. All other unforeseen features are treated as noise.

2. Finding the most appropriate model to rep- 
resent a specific feature. 

Designing the right model in pattern recognition is known as the model selection problem and is still an open area of research. Based on our experience in speech recognition, we used the well-known left-right model (figure 1), which efficiently performs temporal segmentation of the data. Recognition begins in the leftmost state, and each time an event characterizing the feature is recognized, it advances to the next state to the right. When the rightmost state has been reached, the recognition of the feature is complete.

Figure 1: Topology of states used for each feature model

The number of states is generally chosen as 
a monotone function of the length of the 
pattern to be identified according to the 
state duration probabilities. 

In the model depicted in figure 1, the duration in state j may be defined as:

$$d_j(n) = \begin{cases} 0, & n = 0 \\ a_{ijk}, & n = 1,\ i \ne j \ne k \\ (1 - a_{ijk})\, a_{jjj}^{\,n-2}\, (1 - a_{jjj}), & n \ge 2 \end{cases}$$

The state duration in a HMM2 is governed 
by two parameters: the probability of enter- 
ing a state only once, and the probability of 
visiting a state at least twice, with the latter 
modeled as a geometric decay. This distri- 
bution fits a probability density of durations 
[Hj better than the classical exponential dis- 
tribution of an HMM1. This property is of 
great interest in speech recognition when an HMM2 models a phoneme in which a state captures only 1 or 2 frames.



This choice generally gives a high recognition rate. Adding or removing one or two states has sometimes been experimentally observed to increase the recognition rate. The number of states is generally chosen to be the same for all the models.

3. Collecting and labeling a corpus of sequences of observations during several runs to perform learning.

The corpus is used to adjust the parameters 
of the model to take into account the sta- 
tistical properties of the sequences of sensor 
data. Typically, the corpus consists of a set 
of sequences of features collected during sev- 
eral runs of the mobile robot. So, these runs 
should be as representative as possible of the 
set of situations in which features could be 
detected. The construction of the corpus is 
time-consuming, but is crucial to effective 
learning. A model is trained with sequences 
of sensor data corresponding to the partic- 
ular feature it represents. Since a run is 
composed of a sequence of features (and not 
only one feature), we need to segment and 
label each run. To perform this operation, 
we use the previously defined coarse rules 
to identify each feature and extract the rel- 
evant sequences of data. Finally, we group 
the segments of the runs corresponding to 
the same feature to form a corpus to train 
the model of that feature; 

4. Defining a way to be able to detect all the 
features seen during a run of the robot. 

For this, the robot's environment is described by means of a grammar that restricts the set of possible sequences of models. Using this grammar, all the HMM2s are merged into a bigger HMM on which the Viterbi algorithm is used. This grammar is a regular expression describing the legal sequences of HMM2s; it is used to determine the possible ways of merging the HMM2s and their likelihood. More formally, this grammar represents all possible Markov chains corresponding to the hidden part of the merged models. In these chains, nodes correspond to HMM2s associated with a particular feature. Edges between two HMM2s correspond to a merge between the last state of one HMM2 and the first state of the other HMM2. The probability associated with each edge represents the likelihood of the merge.

Then, the most likely sequence of states, as determined by the Viterbi algorithm, determines the ordered list of features that the robot saw during its run. It must be noted that the list of models is known only when the run is completed. We assume that features cannot overlap. The use of a grammar has another important advantage: it allows the elimination of sequences that will never happen in the environment. From a computational point of view, the grammar thus avoids useless calculations.

The grammar can be given a priori or learned. To learn the grammar, we use the previously trained models and re-estimate them on unsegmented data, as in the recognition phase. Specifically, we merge all the models seen by the robot during a complete run into a larger model corresponding to the sequence of observed items and train the resulting model with the unsegmented data.

5. Evaluating the rate of recognition. 



For this, we define a test corpus composed of several runs. For each run, a human compares the sequence of features composing the run, using knowledge of the environment, with what has been detected by the Viterbi algorithm. A feature is recognized if it is detected by the corresponding model close to its real geometric position. Three types of errors can occur:

Insertion: the robot has seen a non-existing feature (false positive). This corresponds to an over-segmentation in the recognition process. Insertions are currently counted only when the width of the inserted feature is more than 80 centimeters;

Deletion: the robot has missed the feature (false negative);

Substitution: the robot has confused the feature with another.

In the experiments that we have run, the results are summarized first as confusion matrices, where an element $C_{ij}$ is the number of times model j has been recognized when the right answer was feature i, and second with the global rates of recognition, insertion, substitution, and deletion.
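The summary of step 5 can be sketched as a small Python function. The exact definition of the global rates is not given in the text; this hypothetical version assumes, as is common in speech recognition, that all rates are expressed relative to the number of reference features (correct + substituted + deleted), with deletions and insertions counted separately from the confusion matrix $C_{ij}$.

```python
def recognition_rates(confusion, deletions, insertions):
    """Global rates from a confusion matrix (hypothetical convention).

    confusion[i][j]: times model j was recognized when the true feature was i
    deletions[i]: times feature i was missed entirely (false negatives)
    insertions: number of non-existing features reported (false positives)
    All rates are relative to the number of reference features.
    """
    n = len(confusion)
    correct = sum(confusion[i][i] for i in range(n))
    substitutions = sum(confusion[i][j] for i in range(n) for j in range(n) if i != j)
    deleted = sum(deletions)
    reference = correct + substitutions + deleted
    if reference == 0:
        raise ValueError("no reference features to score")
    return {
        "recognition": correct / reference,
        "substitution": substitutions / reference,
        "deletion": deleted / reference,
        "insertion": insertions / reference,
    }
```

With this convention, recognition, substitution, and deletion rates sum to 1, while the insertion rate is reported on top of them.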

In the two following sections, we present two experiments in which we used second-order Hidden Markov Models to detect features from sequences of mobile-robot sensor data. In each section, after a brief description of the problem and the mobile robot used, we explain the specific solution to each of the issues introduced in this section.
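Returning to the left-right topology of step 2, the state-duration distribution $d_j(n)$ can be checked numerically. In this sketch the argument names `a_leave_once` (for $a_{ijk}$) and `a_stay` (for $a_{jjj}$) are hypothetical; the closed-form mean follows from the geometric tail for $n \ge 2$.

```python
def duration_prob(n, a_leave_once, a_stay):
    """d_j(n) for a state j of a left-right HMM2.

    a_leave_once: a_ijk, probability of leaving j right after entering it
    a_stay: a_jjj, self-loop probability once j has been visited twice
    """
    if n <= 0:
        return 0.0
    if n == 1:
        return a_leave_once
    return (1.0 - a_leave_once) * a_stay ** (n - 2) * (1.0 - a_stay)


def mean_duration(a_leave_once, a_stay):
    """Closed form of sum over n of n * d_j(n): one visit, or 2 plus a geometric tail."""
    return a_leave_once + (1.0 - a_leave_once) * (2.0 + a_stay / (1.0 - a_stay))
```

For example, with $a_{ijk} = 0.3$ and $a_{jjj} = 0.6$ the probabilities $d_j(n)$ sum to 1 and the mean duration is 2.75 frames. The separate parameter for a single visit is what lets an HMM2 state put extra mass on durations of 1 or 2 frames, as discussed in step 2.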
4 First experiment: Learning and recognition of features in an indoor structured environment

In this first experiment, we used second order 
Hidden Markov Models to learn and to recog- 
nize indoor features such as T-intersections and 
open doors given sequences of data from ultra- 
sonic sensors of an autonomous mobile robot. 
These features are generally called places. 

4.1 The Nomad200 mobile robot 



Figure 2: Our mobile robot (labeled parts: turret, base)

In this experiment, we used a Nomad200 (figure 2) manufactured by Nomadic Technologies¹. It is composed of a base and a turret. The base consists of 3 wheels and tactile sensors. The turret is a uniform 16-sided polygon. On each side, there is an infrared sensor and an ultrasonic sensor. The turret can rotate independently of the base.
Tactile Sensors: A ring of 20 tactile sensors surrounds the base. They detect contact with objects and are only used for emergency situations, where they trigger low-level reflexes such as an emergency stop and backward movement.



Ultrasonic Sensors: The angle between two 
ultrasonic sensors is 22.5 degrees, and each ultra- 
sonic sensor has a beam width of approximately 
23.6 degrees. By examining all 16 sensors, we 
can obtain a 360 degree panoramic view fairly 
rapidly. The ultrasonic sensors give range infor- 
mation from 17 to 255 inches. But the quality 
of the range information greatly depends on the 
surface of reflection and the angle of incidence 
between the ultrasonic sensor and the object. 



Infrared Sensors: The infrared sensors measure the difference between an emitted light and a reflected light. They are very sensitive to the ambient light, the object color, and the object orientation. We assume that the range information is acceptable for short distances, so we use infrared sensors only for distances shorter than 17 inches, where the ultrasonic sensors are not usable.
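The sensor geometry above can be summarized in a few lines of Python. This is a hypothetical sketch: it assumes sensor 0 faces the front of the turret with indices increasing around the ring, and it reads the infrared rule above as "trust the infrared reading when it reports less than 17 inches, otherwise use the sonar reading clipped to its 17 to 255 inch range".

```python
SONAR_COUNT = 16
SONAR_SPACING_DEG = 360.0 / SONAR_COUNT  # 22.5 degrees between adjacent sensors
SONAR_MIN_IN, SONAR_MAX_IN = 17.0, 255.0  # usable sonar range, in inches


def sensor_heading(index):
    """Heading in degrees of sensor `index`, assuming index 0 faces the turret front."""
    return (index % SONAR_COUNT) * SONAR_SPACING_DEG


def fused_range(sonar_in, infrared_in):
    """Range estimate for one side of the turret (hypothetical fusion rule).

    Infrared is used only below the sonar's minimum range of 17 inches;
    otherwise the sonar reading is clipped to its usable interval.
    """
    if infrared_in < SONAR_MIN_IN:
        return infrared_in
    return min(max(sonar_in, SONAR_MIN_IN), SONAR_MAX_IN)
```

Since 16 sensors spaced 22.5 degrees apart cover exactly 360 degrees, sensors 4 and 12 point to the two sides of the turret under this indexing assumption.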



4.2 Specifics of HMM2 application to 
indoor place identification 

Here we discuss the specific issues arising from 
applying HMM2s to the problem of indoor place 
identification, along with our solutions to those 
issues. The numbering corresponds to the numbering of the steps in section 3.
Figure 3: The 10 models to recognize

4.2.1 The set of places 

Currently, we model ten distinctive places that are representative of an office environment: a corridor, a T-intersection on the right (resp. left) of the corridor, an open door on the right (resp. left) of the corridor, a "starting" corner on the right (resp. left) when the robot moves away from the corner, an "ending" corner on the right (resp. left) side of the corridor when the robot arrives at this corner, and two open doors across from each other (figure 3). This set of items is a complete description of what the mobile robot can see during its run. All other unforeseen objects, like people wandering along a corridor, are treated as noise.

To characterize each feature, we need to select the pertinent sensor measures for observing a place. This task is complex because the sensor measures are noisy and because, at the same time that there is a place on the right side of the robot, there may be another place on the left side. For these reasons, we choose to characterize features separately for each side, using the sensor perpendicular to each wall of the corridor and its two neighboring sensors (figure 4). These three sensors normally give valid measures. Since all places except the corridor cause a noticeable variation on these three sensors over time, we define the beginning of a place on one side as the moment when the first sensor's measure suddenly increases, and its end as the moment when the last sensor's measure suddenly decreases. Figure 5 shows an example of this segmentation with the three right-side sensors, for part of an acquisition corresponding to a T-intersection. The first line segment marks the beginning of the T-intersection (sudden increase on the first sensor), and the second line segment marks its end (sudden decrease on the third sensor). To the left of the first line and to the right of the second line are corridors. Figure 6 shows the position of the robot at the beginning and at the end of the T-intersection, together with the measures of the three sensors used at these two positions for the characterization.

Figure 4: The six sonars used for the characterization on each side

Figure 5: The characterization corresponding to a T-intersection on the right side of the robot

Figure 6: The three sonars used for the segmentation of a T-intersection

Next, we must define "global places" taking into account what can be seen on the right side and on the left side simultaneously. To build the global places, we combine the 5 previous places observable on the right side with the 5 places observable on the left side.

An example of the characterization of these 10 places is given in figure 7. This characterization will be used for segmenting and labeling the corpus for training and evaluation.
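The coarse labeling rule described above (a place begins with a sudden increase on the first sensor and ends with a sudden decrease on the last one) can be sketched as follows. The jump threshold, its unit, and the ordering of the three sensors are assumptions chosen for illustration; the text does not state how "suddenly" was quantified.

```python
def segment_places(readings, jump=20.0):
    """Coarse labeling rule for one side of the robot (hypothetical threshold).

    readings: list of (s1, s2, s3) range triples per frame, where s1 is the
    first of the three side sensors to pass a place and s3 the last.
    A place begins when s1 increases by more than `jump` between two frames
    and ends when s3 decreases by more than `jump`. Returns (start, end)
    frame index pairs; frames outside them are treated as corridor.
    """
    segments, start = [], None
    for t in range(1, len(readings)):
        prev, cur = readings[t - 1], readings[t]
        if start is None and cur[0] - prev[0] > jump:
            start = t  # sudden increase on the first sensor
        elif start is not None and prev[2] - cur[2] > jump:
            segments.append((start, t))  # sudden decrease on the last sensor
            start = None
    return segments
```

On a synthetic pass where a T-intersection opening moves past the three sensors in turn, the function returns a single segment spanning from the first sensor's jump to the third sensor's drop, mirroring the two line segments of figure 5.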

4.2.2 The model to represent each place 

In the formalism described in section 2, each place to be recognized is modeled by an HMM2 whose topology is depicted in figure 1.

As the robot is equipped with 16 ultrasonic sensors, the HMM2 models the 16-dimensional, real-valued signal coming from the battery of ultrasonic sensors.

4.2.3 Corpus collecting and labeling 



Figure 8: The corridor used to make the learning corpus

We built a corpus to train a model for each of the 10 places. For this, our mobile robot made 50 passes (back and forth) in a very long corridor (approximately 30 meters). This corridor (figure 8) contains two corners (one at the start of the corridor and one at the end), a T-intersection, and some open doors (at least four, and not always the same ones). The robot ran with a simple navigation algorithm [?] to stay in the middle of the corridor, moving parallel to the two walls that form the corridor. While running, the robot stored all of its ultrasonic sensor measures. The acquisitions were made under real conditions, with people wandering in the lab, doors completely or partially open, and static obstacles such as shelves.

A pass in the corridor contains not just one place but all the places seen while running in the corridor. To learn a particular place, we must manually segment each pass into distinct places and label them. The goal of segmentation and labeling is to identify the sequence of places the robot saw during a given pass; to perform this task, we use the rules defined to characterize each place. Finally, we group the segments from all passes that correspond to the same place, so that each learning corpus associated with a model contains sequences of observations of the corresponding place.
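The grouping step can be sketched as follows, assuming each manually labeled pass is stored as a list of frames plus a list of (start, end, label) segments. This storage format and the place labels are hypothetical; the paper does not specify how the labels were recorded.

```python
from collections import defaultdict


def build_corpora(passes, segmentations):
    """Group labeled segments from all passes into one corpus per place.

    passes:        list of passes; each pass is a list of observation frames.
    segmentations: for each pass, a list of (start, end, label) tuples with
                   `end` exclusive (a hypothetical format for the manual labels).
    Returns a dict mapping each label to its list of observation sequences.
    """
    corpora = defaultdict(list)
    for frames, segments in zip(passes, segmentations):
        for start, end, label in segments:
            corpora[label].append(frames[start:end])
    return dict(corpora)


# Toy example: frames are plain integers instead of 16-dimensional vectors.
passes = [list(range(10)), list(range(100, 106))]
segmentations = [
    [(0, 3, "corridor"), (3, 6, "open_door_right"), (6, 10, "corridor")],
    [(0, 2, "corridor"), (2, 6, "t_intersection_right")],
]
corpora = build_corpora(passes, segmentations)
print({label: len(seqs) for label, seqs in corpora.items()})
# {'corridor': 3, 'open_door_right': 1, 't_intersection_right': 1}
```

Each resulting list is then the training corpus for the model of that place.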
Figure 7: Example of characterization of the 10 places

4.2.4 The recognition phase

The goal of the recognition process is to identify the 9 places in the corridor. We use a tenth model for the corridor itself, because the Viterbi algorithm needs to map each frame to a model during recognition. The corridor model connects two items, much like silence between two words in speech recognition. During this experiment, the robot uses its own reactive algorithm to navigate in the corridor and must decide which places have been encountered during the run. We took 40 acquisitions and used the ten trained models to perform recognition, which is processed independently on each side.
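Once the Viterbi algorithm has mapped every frame to a model, the recognized place sequence can be read off by merging runs of identical labels and discarding the corridor filler, as in this minimal sketch. The label strings are hypothetical, and the sketch assumes two occurrences of the same place are always separated by corridor frames; an actual decoder keeps explicit model boundaries instead.

```python
from itertools import groupby


def frames_to_places(frame_labels, filler="corridor"):
    """Turn a per-frame model labeling into the sequence of detected places.

    Runs of consecutive frames mapped to the same model are merged into one
    segment, and segments of the filler model are dropped, much as silence
    is dropped from a recognized word sequence.
    """
    return [label for label, _ in groupby(frame_labels) if label != filler]


labels = ["corridor", "corridor", "open_door", "open_door", "corridor",
          "t_intersection", "t_intersection", "corridor"]
print(frames_to_places(labels))  # ['open_door', 't_intersection']
```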

4.3 Results and discussion 

Results are given in tables 1 and 2.





                number     %
Seen               144   100
Recognized         130    90
Substituted         11     9
Deleted              2     1
Inserted            60    42

Table 2: Global rate of recognition



We notice that the recognition rates are very high and the confusion rates are very low. This is because each place has a very particular pattern, which makes it very difficult to confuse with another. In fact, the HMM2s used hidden characteristics (i.e., characteristics not explicitly given during the segmentation and labeling of places) to discriminate between places. In particular, a place is characterized not only by variations on the sensors on one side of the robot, but also by variations on the sensors located at the rear or the front of the robot. Observations from the sensors at the front of the robot are very different when the robot is in the middle of the corridor than when it is at the end of the corridor. The models of the start of a corridor (resp. end of a corridor) can therefore be recognized only when the observations of the front and rear sensors correspond to the start (resp. end) of a corridor, which rarely occurs when the robot is in the middle of the corridor. It is thus nearly impossible to have insertions of the start (resp. end) of a corridor in the middle of the corridor.

The HMM2s were able to learn this type of hidden characteristic and to use it for discrimination during recognition.

Note also that although T-intersections and open doors have very similar sensor characteristics, there is nearly no confusion between these two places. The HMM2s learned another characteristic to discriminate between them: the width of an open door differs from the width of an intersection, and the duration-modeling capabilities of the HMM2, presented above and shown in [17], improve the discrimination between these two types of places.
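To make the duration argument concrete, here is a generic log-space Viterbi decoder for a second-order HMM, in which the probability of the next state depends on the two previous states. It is a sketch, not the authors' code; the argument layout (`log_pi`, `log_a1`, `log_a2`, `log_b`) is our own. In the toy model, state 0 is forced to last at least two frames, a constraint that a first-order HMM can only express by duplicating states.

```python
import math

NEG_INF = float("-inf")


def viterbi2(log_pi, log_a1, log_a2, log_b):
    """Viterbi decoding for a second-order HMM (HMM2), in log space.

    log_pi[i]       : log P(q_0 = i)
    log_a1[i][j]    : log P(q_1 = j | q_0 = i)
    log_a2[i][j][k] : log P(q_t = k | q_{t-2} = i, q_{t-1} = j), for t >= 2
    log_b[t][k]     : log-likelihood of observation t in state k
    Returns (most likely state sequence, its log-probability).
    """
    T, N = len(log_b), len(log_pi)
    if T == 1:
        best = max(range(N), key=lambda i: log_pi[i] + log_b[0][i])
        return [best], log_pi[best] + log_b[0][best]
    # delta[j][k]: best log-probability of a path whose last two states are (j, k)
    delta = [[log_pi[j] + log_b[0][j] + log_a1[j][k] + log_b[1][k]
              for k in range(N)] for j in range(N)]
    backptrs = []  # one table per t >= 2: best q_{t-2} given (q_{t-1}, q_t)
    for t in range(2, T):
        new_delta = [[NEG_INF] * N for _ in range(N)]
        psi = [[0] * N for _ in range(N)]
        for j in range(N):
            for k in range(N):
                best_i = max(range(N), key=lambda i: delta[i][j] + log_a2[i][j][k])
                new_delta[j][k] = delta[best_i][j] + log_a2[best_i][j][k] + log_b[t][k]
                psi[j][k] = best_i
        delta = new_delta
        backptrs.append(psi)
    # Best final pair of states, then backtrack one state at a time.
    j, k = max(((j, k) for j in range(N) for k in range(N)),
               key=lambda jk: delta[jk[0]][jk[1]])
    score = delta[j][k]
    path = [j, k]
    for psi in reversed(backptrs):
        path.insert(0, psi[path[0]][path[1]])
    return path, score


# Toy two-state left-to-right model: state 0 must last at least two frames,
# which the second-order transitions express directly.
L0, HALF = NEG_INF, math.log(0.5)
log_pi = [0.0, L0]
log_a1 = [[0.0, L0], [L0, 0.0]]          # after one frame in state 0, stay
log_a2 = [[[HALF, HALF], [L0, 0.0]],     # (0,0): stay or move; (0,1): stay in 1
          [[L0, L0], [L0, 0.0]]]         # (1,0): unused; (1,1): stay in 1
log_b = [[0.0, -1.0], [-1.0, 0.0], [-1.0, 0.0]]
path, score = viterbi2(log_pi, log_a1, log_a2, log_b)
print(path)  # [0, 0, 1]
```

Although the second frame favors state 1, the decoder keeps state 0 for two frames because the transition tensor forbids leaving it earlier.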

The recognition rate for two open doors across from each other is mediocre (50%). There is a great variety of doors that can overlap, yet we define only one model to represent all these situations, so this model is a very general model of two doors across from each other. Defining more specific models of this place should increase its recognition rate.

The major problem is the high insertion rate. Most insertions are due to the inaccuracy of the navigation algorithm and to unexpected obstacles. Sometimes the mobile robot has to avoid people or obstacles, and in these cases it does not always run parallel to the two walls or stay in the middle of the corridor. These conditions cause reflections on some sensors, which are then interpreted as places. A level incorporating knowledge about the environment should fix this problem.

Total: 8, 8, 46, 8, 7, 9, 46, 4, 60
% reco.: 88, 88, 88, 91, 100, 86, 89, 93, 50

Table 1: Confusion matrix of places

Finally, the global recognition rate is 92%, the insertion rate is 42%, and deletions are very rare (less than 1.5%).
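The counts reported here (recognized, substituted, deleted, inserted) follow the vocabulary of speech-recognition scoring. The paper does not describe its scoring procedure, so the sketch below assumes the standard approach: align the reference sequence of places from the manual labeling with the recognized sequence using a unit-cost minimum edit distance, then classify each alignment step. Dividing the counts by the number of reference places gives rates.

```python
def align_counts(reference, hypothesis):
    """Align two label sequences with unit-cost edit distance and count
    correct, substituted, deleted and inserted items."""
    n, m = len(reference), len(hypothesis)
    # d[i][j]: minimum number of edits turning reference[:i] into hypothesis[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (reference[i - 1] != hypothesis[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    # Backtrack along one optimal alignment and classify each step.
    counts = {"correct": 0, "substituted": 0, "deleted": 0, "inserted": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (reference[i - 1] != hypothesis[j - 1]):
            same = reference[i - 1] == hypothesis[j - 1]
            counts["correct" if same else "substituted"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            counts["deleted"] += 1
            i -= 1
        else:
            counts["inserted"] += 1
            j -= 1
    return counts


reference = ["start", "door", "T", "end"]              # places actually passed
hypothesis = ["start", "door", "door", "end", "door"]  # recognized places
counts = align_counts(reference, hypothesis)
print(counts)  # {'correct': 3, 'substituted': 1, 'deleted': 0, 'inserted': 1}
rates = {k: 100.0 * v / len(reference) for k, v in counts.items()}
```

Because insertions are counted relative to the number of reference places, an insertion rate can exceed what intuition from the other rates suggests, as in the tables.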

5 Second experiment: Situation identification for planetary rovers: Learning and Recognition

In a second experiment, we want to detect par- 
ticular features (which we call situations) when 
an outdoor teleoperated robot is exploring an 
unknown environment. 

This experiment differs from the previous one in three main ways:

1. the robot is an outdoor robot;

2. the sensors used as the observation are of a different type than in the indoor experiment;

3. we performed multiple learning and recognition scenarios using different sets of sensors, in order to test the robustness of detection when some sensors break down.

5.1 Marsokhod rover 

The rover used in this experiment is a Marsokhod rover (see figure 9), a medium-sized planetary rover originally developed for the Russian Mars exploration program; in the NASA Marsokhod, the instruments and electronics have been changed from the original. The rover has six independently driven wheels² and three chassis segments that articulate independently.

2 For the experiments, the right rear wheel had a broken gear, so it rolled passively.
Figure 9: The Marsokhod rover



It is configured with imaging cameras, a spectrometer, and an arm. The Marsokhod platform was demonstrated in field tests from 1993 to 1999 in Russia, Hawaii, and the deserts of Arizona and California; the field tests were designed to study user interface issues, science instrument selection, and autonomy technologies.

The Marsokhod is controlled either through command sequences or by direct teleoperation. In either case, the rover is sent discrete commands that describe motion in terms of translation and rotation rate and total time or distance. The Marsokhod is instrumented with sensors that measure body, arm, and pan/tilt geometry, wheel odometry and currents, and battery currents. The sensors used in this paper are roll (angle from vertical in the direction perpendicular to travel), pitch (angle from vertical in the direction of travel), and the motor currents in each of the 6 wheels.

The experiments in this paper were performed in an outdoor "sandbox," a gravel and sand area about 20 m × 20 m with assorted rocks and some topography. This space is used to perform small-scale tests in a reasonable approximation of a planetary (Martian) environment. We distinguish between small rocks (less than approximately 15 cm high) and large rocks (greater than approximately 15 cm high). We also distinguish between the one large hill (approximately 1 m high) and the three small hills (0.3 to 0.5 m high).

5.2 Specifics of HMM2 application to outdoor situation identification

Here we discuss the specific issues arising from applying HMM2s to outdoor situation identification, along with our solutions to those issues. The numbering corresponds to the numbering of the steps in section 3.

5.2.1 The set of situations 

Currently, we model six distinct situations that are representative of a typical outdoor exploration environment: the robot climbing a small rock on its left (resp. right) side, a big rock on its left side,³ a small (resp. big) hill, and a default situation of level ground.

This set of items is considered to be a complete 
description of what the mobile robot can see dur- 
ing its runs. All other unforeseen situations, like 
flat rocks or holes, are treated as noise. 

One possible application of this technique would be to identify internal faults of the rover (e.g., broken encoders or stuck wheels). This would require instrumenting the rover to cause faults on command, which is not currently possible on the Marsokhod. Instead, the situations used in this experiment were chosen to illustrate the possibility of using a limited sensor suite to identify situations; in fact, some sensors (such as joint angles) were deliberately left unused to make the problem more challenging.

3 The situation of a big rock on the right side was not considered because of the non-functional right-side wheel.

Since Hidden Markov Models can model signals whose properties change over time, we have to choose a set of sensors (as the observation) that show noticeable variations when the Marsokhod crosses a rock or a hill. From the sensors described in section 5.1, we identified eight such sensors: roll, pitch, and the six wheel currents. We define coarse rules to identify each situation (used by humans to segment and label the corpus for training and evaluation):

• When the robot crosses a small (resp. big) 
rock on its left, we notice a distinct sen- 
sor pattern. In all cases, the roll sensor 
shows a small (resp. big) increase when 
climbing the rock, then a small (resp. big), 
sudden decrease when descending from the 
rock. These two variations usually appear 
sequentially on the front, middle, and rear 
left wheels. The pitch sensor always shows 
a small (resp. big) increase, then a small 
(resp. big), sudden decrease, and finally a 
small (resp. big) increase. There is little 
variation on the right wheels. 

• When the robot crosses a small rock on its 
right side, we observe variations symmetric 
to the case of a small rock on the left side. 

• When the robot crosses a small (resp. big) hill, the pitch sensor usually shows a small (resp. big) increase, then a small (resp. big) decrease, and finally a small (resp. big) increase. The roll sensor does not always vary. However, there is a gradual, small (resp. big) increase followed by a gradual, small (resp. big) decrease on all (or almost all) of the six wheel current sensors.



5.2.2 The model to represent each situation

Figure 10: Topology of states used for each model of situation

In the formalism described in section 2, each situation to be recognized is modeled by an HMM2 whose topology is depicted in figure 10. This topology is well suited to the type of recognition we want to perform. In this experiment, each model has five states to model the successive events characterizing a particular situation; this number of states gave the best recognition rate in our experiments.
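A five-state HMM2 needs a three-index transition tensor. The sketch below builds a uniform starting point for training under an assumed left-to-right topology in which each state either loops or advances to the next one; the exact arcs of figure 10 may differ, and the function name is hypothetical. Entries left at zero stay at zero under Baum-Welch re-estimation, so the topology is preserved during training.

```python
def left_to_right_hmm2(n_states):
    """Uniform initial second-order transitions for an assumed left-to-right
    topology (each state loops or moves to the next one).

    Returns a[i][j][k] = P(q_t = k | q_{t-2} = i, q_{t-1} = j); rows for
    state pairs that the topology cannot produce are left all zero.
    """
    a = [[[0.0] * n_states for _ in range(n_states)] for _ in range(n_states)]
    for i in range(n_states):
        for j in (i, i + 1):          # reachable pairs: loop or advance
            if j >= n_states:
                continue
            successors = [k for k in (j, j + 1) if k < n_states]
            for k in successors:
                a[i][j][k] = 1.0 / len(successors)
    return a


a = left_to_right_hmm2(5)
print(a[0][0])  # [0.5, 0.5, 0.0, 0.0, 0.0]
print(a[3][4])  # [0.0, 0.0, 0.0, 0.0, 1.0]
```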

5.2.3 Corpus collecting and labeling 

We built six corpora to train a model for each situation. For this, our mobile robot made approximately fifty runs in the sandbox. For each run, the robot received one discrete translation command ranging from three meters to twenty meters. Rotation motions are not part of the corpus. Each run contains different situations, and each run is unique (i.e., the area traversed and the sequence of situations are different each time). A run contains not just one situation but all the situations seen while running. For each run, we noted the situations seen, for later segmentation and labeling.
Figure 11: Segmentation and labeling of a run.

The rules defined to characterize a situation are used to segment and label each run. An example of segmentation and labeling is given in figure 11. The sensors are shown in the following order (from the top): roll, pitch, the three left wheel currents, and the three right wheel currents. A vertical line marks the beginning or the end of a situation. The default situation alternates with the other situations. The sequence of situations in the figure is the following (as labeled in the figure): small rock on the left side, default situation, big rock on the right side, default situation, small hill, default situation, and big hill.

5.2.4 Model training 

In this experiment, we do not need to interpolate the observations made by the robot, because it always moves at approximately the same translation speed. Since we want to compare different possibilities and test whether detection remains usable when some sensors break down, we train a separate model for each of three sets of input data. The observations used as input to train each model consist of:

• eight coefficients: the first derivative (i.e., 
the variation) of the values of the eight sen- 
sors used for segmentation. 

• six coefficients: the first derivative (i.e., the 
variation) of the values of the six wheel cur- 
rent sensors. 

• two coefficients: the first derivative (i.e., the 
variation) of the values of the roll and the 
pitch sensors. 
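A minimal sketch of the three observation sets, assuming each raw frame is a list of the eight sensor values in the order used in figure 11 and that the "first derivative" is approximated by a simple difference between consecutive frames. The paper does not say how the derivative is computed, and the column indices and set names are hypothetical.

```python
# Hypothetical column indices, following the sensor order given for figure 11:
# roll, pitch, three left wheel currents, three right wheel currents.
ROLL, PITCH = 0, 1
WHEEL_CURRENTS = [2, 3, 4, 5, 6, 7]
SENSOR_SETS = {
    "eight": [ROLL, PITCH] + WHEEL_CURRENTS,
    "six": WHEEL_CURRENTS,
    "two": [ROLL, PITCH],
}


def first_differences(frames):
    """Frame-to-frame variation of every channel (one frame shorter than input)."""
    return [[cur - prev for prev, cur in zip(f0, f1)]
            for f0, f1 in zip(frames, frames[1:])]


def observations(frames, sensor_set):
    """Observation vectors for one of the three sensor sets."""
    columns = SENSOR_SETS[sensor_set]
    return [[row[c] for c in columns] for row in first_differences(frames)]


frames = [
    [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    [0.5, 1.0, 1.5, 1.0, 1.0, 1.0, 1.0, 1.0],
    [0.5, 0.0, 1.5, 2.0, 1.0, 1.0, 1.0, 1.0],
]
print(observations(frames, "two"))  # [[0.5, 1.0], [0.0, -1.0]]
```

Training the three models then only differs in which column subset is passed to `observations`, which is what lets the roll/pitch model keep working if the wheel current sensors fail.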

Each training uses segmented data, and each 
model is trained independently with its corpus. 

There are two reasons for training three different models. The first is to check whether the eight sensors used for segmentation are necessary to learn and recognize situations, or whether a subset is sufficient. Second, we want to be able to recognize situations even if one or more sensors do not work; e.g., if some wheel sensors fail, this will affect (during recognition) the models using the six wheel current sensors or all eight sensors, but not the models using just the roll and pitch sensors.

5.2.5 The recognition phase 

The goal of recognition is to identify the five situ- 
ations (small rock on the left or right; big rock on 
the left; small or big hill) while the robot moves 
in the sandbox. The default situation model con- 
nects two items much like silence between two 
words in speech recognition. 

During the recognition phase, the robot was operated as during corpus collection. We took approximately 40 acquisitions and used the six trained models to perform recognition. We perform three independent recognitions, one for each of the three training configurations (sets of input sensors).

5.2.6 Results and discussion 

In each confusion matrix, the acronyms used are: BL = big rock on the left, SL = small rock on the left, SR = small rock on the right, BH = big hill, and SH = small hill. The results of the three independent experiments are shown and analyzed in the next three subsections; in the fourth subsection, we present a global analysis of the results.

Experiment with eight sensors For eight sensors, as each situation can be easily distinguished from the others, the global recognition rate is excellent (87%) (see tables 3 and 4). Small (resp. big) rocks on the left are sometimes confused with big (resp. small) rocks on the left; the signal provided by the sensors does not contain the information necessary to discriminate between these two models. In fact, the variations on the sensors are nearly the same: the only criterion distinguishing these two models is the amplitude of the variation on the three left wheels, and this is evidently not sufficient. The small rocks on the right are perfectly recognized. This situation has a very distinctive pattern and can only with difficulty be confused with another. Moreover, since no situation with a big rock on the right was learned, small rocks on the right cannot be confused with big ones.

Table 3: Confusion matrix of situations, eight sensors.

               number     %
Seen              135   100
Recognized        118    87
Substituted        15    11
Deleted             2     2
Inserted           90    67

Table 4: Global rate of recognition, eight sensors.

The major problem is the high insertion rate, which is due to sensor noise being recognized as a situation. This is especially the case for situations characterized only by small variations on part (or all) of the set of sensors, in particular the crossing of a small hill.
Table 5: Confusion matrix of situations, six sensors.

Table 6: Global rate of recognition, six sensors.

Experiment with six sensors With six sensors, the global recognition rate is still very good (see tables 5 and 6). There are only four percent more substitutions, due to the loss of information used to distinguish situations. On the other hand, the insertion rate increased by 25%: with only the six wheel current sensors, nearly one recognition out of two is an insertion. The six wheel current sensors are very noisy, and the roll and pitch sensors are useful to distinguish between simple noise and real situations, which explains the increase in insertions.
Table 7: Confusion matrix of situations, two sensors.

Table 8: Global rate of recognition, two sensors.

Experiment with two sensors With only the roll and pitch sensors, the global recognition rate remains good, and the insertion rate decreases significantly (see tables 7 and 8). In fact, these two sensors are not very noisy, and a variation on them generally corresponds to a real situation. However, these two sensors do not provide sufficient information to distinguish between situations, which is why the substitution rate is high.

Global analysis From the results of these experiments, we can draw some conclusions. The best way to perform recognition is with eight sensors: the recognition rate is slightly better than with six sensors, and the insertion rate is much lower. This can be explained by the fact that the six wheel current sensors are very noisy, whereas the roll and pitch sensors, which are much less noisy, help distinguish a situation to be recognized from simple noise on the wheel current sensors. Nonetheless, the models learned in the last two experiments could be useful during long explorations in which sensors may fail, since they provide usable, albeit less reliable, recognition.

This experiment can be extended to fault detection, for example of broken wheels or sensor failures. In fact, we can build one model of a particular situation in which all sensors work and several models of the same situation in which one or several sensors are broken: for example, a model of a big rock on the right side and a model of a big rock on the right side when the front left wheel is broken. Using these models, we can recognize situations together with the state of the robot's sensors, and thereby detect failing sensors or motors.

6 Related work 

A variety of approaches to state estimation and 
fault diagnosis have been proposed in the con- 
trol systems, artificial intelligence, and robotics 
literature. 

Techniques for state estimation of continuous values, such as Kalman filters, can track multiple possible hypotheses [20, 25]. However, they must be given an a priori model of each possible state. One of the strengths of the approach presented in this paper is its ability to construct models from training data and then use them for state identification.

Qualitative model-based diagnosis techniques [?] consider a snapshot of the system rather than its history. In addition, the system state is assumed to be consistent with a propositional description of one of a set of possible states. Noisy data and temporal patterns violate these assumptions.

Hidden Markov Models have been applied to fault detection in continuous processes [2]; there, the model structure is supplied and only the transition probabilities are learned from data. In the approach presented in this paper, the HMM learns without prior knowledge of the models.

Markov models have been widely used in mobile robotics. Thrun [22] reviews techniques based on Markov models for three main problems in mobile robotics: localization, map building, and control. In these techniques, a Markov model represents the environment, and a specific algorithm is used to solve the problem. Our approach differs in a number of ways. We address a different problem: the interpretation of temporal sequences of mobile-robot sensor data to automatically detect features. Moreover, we use very little a priori knowledge, namely the topology of the model, which reflects a human's understanding of the sequences of sensor data that characterize a particular feature; all the other parameters of the model are estimated by learning. By contrast, the techniques presented in [22] need some preliminary knowledge: a map of the environment, a sensor model, and an actuator model. Usually, there is no learning component in these techniques.

The best-known work that includes a learning component is by Koenig and Simmons [12]. They start with an a priori topological map that is translated into a Markov model before any navigation takes place. An extension of the Baum-Welch algorithm re-estimates the Markov model representing the environment, as well as the sensor and actuator models. This work differs from ours in a number of ways:

• They use a Markov model to model the en- 
vironment, whereas we use a Markov model 
to model the sequence of events composing 
a particular feature; 

• They need some a priori knowledge: a topo- 
logical map of the environment, and sensor 
and actuator models; 

• They make hypotheses on the value of some 
parameters to reduce the number of param- 
eters to estimate; we do not make any such 
hypothesis; 

• The observations they use are discrete, symbolic and unidimensional. They are obtained by an abstraction (based on some hypotheses) of the raw data of several sensors. In our method, by contrast, discrete, symbolic and unidimensional observations are the result: they are obtained by interpreting a sequence of raw data from several sensors without any prior hypothesis.

Our work can be seen as a preliminary step for all of the work presented in [23]. We have previously built a sensor model based on the recognition rates reported in this article; this model allowed robust localization in dynamic environments [2].

Hidden Markov Models have been used for the interpretation of temporal sequences in robotics [?]. The approach presented in this paper is more robust for the following reasons:

• Yang, Xu, and Chen make some restrictions and hypotheses on the observations they use: each component of the observation is discretized, since they use an HMM with discrete observations. Moreover, each component of the observation is presumed independent of the others. In our work, the probability of an observation given a particular state is represented by a mixture of Gaussians. Thus we are able to deal with observations consisting of noisy continuous data from different types⁴ of sensors, without any a priori assumption about the independence of these data and without any discretization of the data;

• The particular approach we use is the second-order HMM (HMM2). HMM2s have been shown to be effective models for capturing temporal variations in speech [17], in many cases surpassing first-order HMMs when the trajectory in the state space has to be accounted for. For instance, in the first experiment, the duration-modeling capabilities of the HMM2 allowed the Viterbi algorithm to distinguish an open door from a T-intersection.

7 Conclusion and future directions

In this paper, we have presented a new method, based on second-order Hidden Markov Models, for mobile robots to learn to detect features automatically. This method gives very good results and is robust to noise, which confirms that HMM2s are well suited to this task. We showed that the recognition process is robust to dynamic environments. Features are detected even if they are quite different from the learned features: for instance, an open door is recognized whether it is completely or partially open. Moreover, features are detected even if they are seen from a different point of view. For instance, in contrast to Kortenkamp et al. [?], features are detected even if the robot is not at a given distance from a wall and does not move in a direction perfectly parallel to the two walls forming the corridor. Finally, our approach has been successfully tested in an outdoor environment.

4 In the second experiment, the observation is composed of three types of sensors.

The results could be improved by adding more models, to decrease the intra-class variability (especially for open doors across from each other) and to take contextual information into account. Choosing a different number of states for each feature could also improve the results.

In addition, the method combines analytical methods and pattern classification methods. First, we analyze the sensor data and define a model to represent the patterns in the data. Second, the learning algorithm automatically adjusts the parameters of the model using a learning corpus. Moreover, the learning algorithm was able to extract more complex characteristics of a feature than simple variations of sensor data between two consecutive moments. For instance:

• The length of a sequence⁵ of observations was taken into account in the first experiment to detect the difference between a T-intersection and an open door;

• In the first experiment, the gradual decrease (resp. increase) over time of the values of sensors located at the front (resp. rear) of the robot was used to characterize a start (resp. an end) of corridor;

• The algorithm can find correlations between data from sensors of different types to characterize a feature. For example, the correlation of the roll, pitch and wheel current sensors is used to characterize a situation in the second experiment.

5 The number of observations composing the sequence.

However, our method has two drawbacks: 

• As in Kortenkamp et al. [?], a feature can only be recognized once it has been completely visited. For example, the robot would have to go back to turn at a T-intersection after it had recognized it.

• Moreover, with the current technique, the list of places is known only when the run has been completed. To detect features online during navigation, we can use a variant of the Viterbi algorithm called Viterbi-block [14]. This algorithm is based on a local-optimum comparison of the different probabilities computed by the Viterbi algorithm while time-warping a shifting window of fixed length in the signal against the different HMMs. This algorithm can detect features a few meters after they have been seen. We have used this algorithm to perform localization in dynamic environments [2].